Pattern-based Aggregation of Named Entity Extractors

Author

  • T. Lemmond
Abstract

Despite significant advances in named entity extraction technologies, state-of-the-art extraction tools achieve insufficient accuracy rates for practical use in many operational settings. However, they are not all prone to the same types of error, suggesting that substantial improvements may be achieved via appropriate combinations of existing tools, provided their behavior can be accurately characterized and quantified. In this paper, we present an inference framework that leverages the joint characteristics of the extractors' error processes via a pattern-based representation of extracted entity data. This approach has been shown to produce statistically significant improvements in entity extraction relative to standard performance metrics and to mitigate the weak performance of entity extractors operating under suboptimal conditions. Moreover, this aggregation methodology provides a framework for quantifying uncertainty in extracted entity output, and it can readily adapt to sparse data conditions.
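
The abstract does not spell out the inference framework, so the following is only a rough illustration of what a pattern-based aggregation could look like, not the paper's actual model. It assumes each token is described by a "pattern" (a tuple of 0/1 votes, one per extractor) and estimates, from hypothetical labeled data, how often each joint pattern coincides with a true entity. The function name `fit_pattern_table`, the smoothing parameter `alpha`, and the toy data are all assumptions made for this sketch.

```python
from collections import defaultdict


def fit_pattern_table(observations, alpha=1.0):
    """Estimate P(entity | pattern) from labeled observations.

    observations: iterable of (pattern, is_entity) pairs, where pattern is a
    tuple of 0/1 votes, one per extractor (an assumed representation).
    alpha: additive smoothing, so unseen or rare patterns fall back toward 0.5.
    """
    # pattern -> [count of non-entities, count of entities]
    counts = defaultdict(lambda: [0, 0])
    for pattern, is_entity in observations:
        counts[tuple(pattern)][int(is_entity)] += 1

    def prob_entity(pattern):
        neg, pos = counts.get(tuple(pattern), (0, 0))
        return (pos + alpha) / (neg + pos + 2 * alpha)

    return prob_entity


# Toy labeled data for three hypothetical extractors.
train = [
    ((1, 1, 0), True),
    ((1, 1, 0), True),
    ((1, 1, 0), False),
    ((0, 0, 1), False),
    ((0, 0, 1), False),
]

prob_entity = fit_pattern_table(train)
print(prob_entity((1, 1, 0)))  # extractors 1 and 2 agree, extractor 3 does not
print(prob_entity((0, 0, 1)))  # only extractor 3 fires
print(prob_entity((1, 1, 1)))  # pattern never seen in training
```

Because the estimate is keyed on the joint pattern rather than on each extractor separately, it can reflect correlated errors, and the returned probability doubles as a simple uncertainty score. The additive smoothing is one plain way to cope with sparse data; the paper may handle sparsity quite differently.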

Similar resources

A combined Approach to Arabic Named Entity recognition Using SVM and Pattern Extracted method applied to Topic Detection

Named Entity Recognition (NER) is a key task for automatic text processing that is required in a wide variety of applications. NER techniques range from handcrafted rules to machine learning approaches. In this paper, we describe the development and implementation of an Arabic Named Entity Recognition (ANER) system based on a machine learning approach. We used an SVM classifier with a set of depen...

Improving Named Entity Extraction Accuracy using Unlabeled Data and Several Extractors (pp. 29-38)

This paper proposes feature augmentation methods using unlabeled data and several Named Entity (NE) extractors. We collect NE-related information of each word (which we call NE-related labels) from unlabeled data by using NE extractors. NE-related labels which we collect include candidate NE class labels of each word and NE class labels of co-occurring words. To accurately collect the NE-relate...

ROSeAnn: Reconciling Opinions of Semantic Annotators

Named entity extractors can be used to enrich both text and Web documents with semantic annotations. While originally focused on a few standard entity types, the ecosystem of annotators is becoming increasingly diverse, with recognition capabilities ranging from generic to specialised entity types. Both the overlap and the diversity in annotator vocabularies motivate the need for managing and i...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...

NER-FL: A Novel Named Entity Recognizer of Farsi Language using the Web-Based Natural Language Processors and Semantic Annotations

Named Entity Recognition is a main task in the NLP area that has yielded multiple web-based natural language processors gaining popularity in the Semantic Web community for extracting knowledge from web data. These processors are generally organized as pipelines, using dedicated APIs and various taxonomies for extracting, classifying and disambiguating named entities. In this paper, we address the ...

Publication date: 2011